Okay, let’s check out the National Student Clearinghouse (NSC) enrollment data.
## # A tibble: 6 × 7
## PersonID EnrollmentBeginTimeID EnrollmentEndTi…¹ OPEID Insti…² Enrol…³ OPEID.6
## <int> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 68 20070820 20071214 0025… Grace … L 002547
## 2 68 20080827 20081218 0023… Univer… F 002371
## 3 68 20090112 20090514 0023… Univer… F 002371
## 4 68 20090909 20091216 0034… Univer… F 003469
## 5 68 20110309 20110524 0125… Metrop… F 012586
## 6 68 20110606 20110815 0125… Metrop… F 012586
## # … with abbreviated variable names ¹EnrollmentEndTimeID, ²InstitutionName,
## # ³EnrollmentStatus
## [1] "PersonID" "EnrollmentBeginTimeID" "EnrollmentEndTimeID"
## [4] "OPEID" "InstitutionName" "EnrollmentStatus"
## [7] "OPEID.6"
The dataset has 276,249 rows and 7 columns. Here are the columns and their definitions.
Essentially, this dataset is organized by enrollment period: each observation is a single enrollment period (usually a semester) at one institution, with the institution’s information and the beginning and end dates of that period (stored as YYYYMMDD numbers). For example, if an individual attended South Dakota State University in a “typical” fashion (fall and spring semesters) and graduated in 4 years, there would be 8 observations for that PersonID, one per semester.
There are a few pieces of information that I want to consolidate from this dataset:
I was going to include the length of time that a PersonID attended college, but due to differences in how institutions report enrollment periods, it didn’t seem like it would be a great indicator.
The pieces of information missing from the original dataset are the location and type of institution attended. To get them, we will need to join the original dataset with an IPEDS dataset using the OPEID.
The IPEDS data and the NSC data only match on the first 6 digits of the OPEID, so I will truncate the NSC OPEID to its first 6 digits (the OPEID.6 column). Then I can join.
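Here is a minimal sketch of the truncate-then-join step. The analysis itself was run in R and its code is not shown, so this is a Python (standard library only) illustration, not the original implementation. It assumes OPEIDs are kept as text so leading zeros survive (the printed tibble shows OPEID and OPEID.6 as `<chr>`); every OPEID, Unitid, and state in the example is a hypothetical placeholder.

```python
# Sketch: truncate NSC OPEIDs to 6 digits and inner-join to an IPEDS lookup.
# All OPEIDs, Unitids, and states below are hypothetical placeholders.

def opeid6(opeid: str) -> str:
    """Return the first 6 characters of an OPEID kept as text (leading zeros intact)."""
    return opeid.strip()[:6]


def inner_join_on_opeid6(enrollments, ipeds_by_opeid6):
    """Attach IPEDS fields to each enrollment row; rows with no match are dropped."""
    joined = []
    for row in enrollments:
        key = opeid6(row["OPEID"])
        match = ipeds_by_opeid6.get(key)
        if match is None:
            continue  # same effect as an inner join: unmatched OPEIDs fall out
        joined.append({**row, "OPEID.6": key, **match})
    return joined


enrollments = [
    {"PersonID": 1, "OPEID": "12345600"},
    {"PersonID": 1, "OPEID": "12345601"},  # branch campus sharing the 6-digit prefix
    {"PersonID": 2, "OPEID": "99999900"},  # no IPEDS match, so it is dropped
]
ipeds_by_opeid6 = {"123456": {"Unitid": "100001", "State": "MN"}}

joined = inner_join_on_opeid6(enrollments, ipeds_by_opeid6)
print(len(joined))  # 2
```

Dropping unmatched rows is what makes the joined data slightly shorter than the original enrollment file.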
## # A tibble: 6 × 14
## Perso…¹ Enrol…² Enrol…³ OPEID Insti…⁴ Enrol…⁵ OPEID.6 Unitid City State FIPS
## <int> <dbl> <dbl> <chr> <chr> <fct> <chr> <chr> <chr> <chr> <dbl>
## 1 68 2.01e7 2.01e7 0025… Grace … L 002547 181093 Omaha NE 31
## 2 68 2.01e7 2.01e7 0023… Univer… F 002371 174491 Sain… MN 27
## 3 68 2.01e7 2.01e7 0023… Univer… F 002371 174491 Sain… MN 27
## 4 68 2.01e7 2.01e7 0034… Univer… F 003469 219383 Siou… SD 46
## 5 68 2.01e7 2.01e7 0125… Metrop… F 012586 181303 Omaha NE 31
## 6 68 2.01e7 2.01e7 0125… Metrop… F 012586 181303 Omaha NE 31
## # … with 3 more variables: GeographicRegion <dbl>, CountyCode <chr>,
## # InstitutionSector <dbl>, and abbreviated variable names ¹PersonID,
## # ²EnrollmentBeginTimeID, ³EnrollmentEndTimeID, ⁴InstitutionName,
## # ⁵EnrollmentStatus
## [1] "PersonID" "EnrollmentBeginTimeID" "EnrollmentEndTimeID"
## [4] "OPEID" "InstitutionName" "EnrollmentStatus"
## [7] "OPEID.6" "Unitid" "City"
## [10] "State" "FIPS" "GeographicRegion"
## [13] "CountyCode" "InstitutionSector"
After joining these, we now have 276,238 rows and 14 columns. That is 11 fewer rows than the original enrollment data because a few OPEIDs did not match. Overall, though, the data now include the sector and location of each institution the individual attended.
First, we will create a dataset with one row per unique PersonID in nsc.enrollment.ipeds and a column confirming their post-secondary attendance.
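A small Python sketch of this step (illustrative only; the original R code is not shown). Every PersonID with at least one enrollment record gets “Yes”. The `flag_master` helper is hypothetical: it assumes that master-file students absent from the NSC data end up coded “No”, which is consistent with the “No” counts in the crosstabs later on, but the original mechanism is not shown.

```python
# Sketch: one "Yes" row per unique PersonID found in the enrollment data,
# then "No" for master-file students with no enrollment record (assumed handling).

def attended_flags(enrollment_rows):
    """Return {PersonID: "Yes"} for every PersonID with at least one enrollment."""
    return {row["PersonID"]: "Yes" for row in enrollment_rows}


def flag_master(master_ids, flags):
    """Hypothetical helper: students missing from the NSC data are marked "No"."""
    return {pid: flags.get(pid, "No") for pid in master_ids}


# PersonIDs 68, 161, and 676 appear in the printed output; 999 is made up.
enrollment_rows = [{"PersonID": 68}, {"PersonID": 68}, {"PersonID": 161}, {"PersonID": 676}]
flags = attended_flags(enrollment_rows)
print(len(flags))                     # 3 unique PersonIDs
print(flag_master([68, 999], flags))  # {68: 'Yes', 999: 'No'}
```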
## # A tibble: 6 × 2
## PersonID attended.ps
## <int> <chr>
## 1 68 Yes
## 2 161 Yes
## 3 676 Yes
## 4 758 Yes
## 5 786 Yes
## 6 1698 Yes
## [1] "PersonID" "attended.ps"
There are 35,056 rows and 2 columns in the dataset. The columns provide the unique PersonID along with a newly created column confirming that they attended post-secondary education.
Next, we will extract the first month and year each PersonID attended a post-secondary institution, which will help us figure out how long after high school graduation they waited before attending a post-secondary institution.
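Because the begin dates are YYYYMMDD numbers, “first attendance” is simply the minimum begin date per PersonID. A Python sketch (not the original R code) using begin dates from the printed output, deliberately shuffled so the minimum is not the first row. It casts the `<dbl>` value to `int` before converting to text, matching the `<chr>` first.attend.ps column.

```python
# Sketch: earliest enrollment begin date per PersonID.
# EnrollmentBeginTimeID is printed as <dbl> (e.g. 20070820), so it is cast to int
# before converting to a YYYYMMDD string; equal-length YYYYMMDD strings sort by date.

def first_attendance(enrollment_rows):
    first = {}
    for row in enrollment_rows:
        pid = row["PersonID"]
        begin = str(int(row["EnrollmentBeginTimeID"]))
        if pid not in first or begin < first[pid]:
            first[pid] = begin
    return first


# Begin dates for PersonIDs 68 and 161 taken from the printed output, out of order.
rows = [
    {"PersonID": 68, "EnrollmentBeginTimeID": 20090909.0},
    {"PersonID": 68, "EnrollmentBeginTimeID": 20070820.0},
    {"PersonID": 68, "EnrollmentBeginTimeID": 20110606.0},
    {"PersonID": 161, "EnrollmentBeginTimeID": 20090908.0},
]
print(first_attendance(rows))  # {68: '20070820', 161: '20090908'}
```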
## # A tibble: 6 × 2
## PersonID first.attend.ps
## <int> <chr>
## 1 68 20070820
## 2 161 20090908
## 3 676 20090113
## 4 758 20070827
## 5 786 20100823
## 6 1698 20070904
## [1] "PersonID" "first.attend.ps"
This dataset has 35,056 rows and 2 columns. Essentially, this dataset provides each unique PersonID with the earliest begin time for post-secondary education.
Next, we will determine how many different institutions each PersonID attended during their post-secondary career.
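Counting institutions is a distinct count per PersonID. This Python sketch assumes an institution is identified by OPEID.6. The original R code is not shown, so the key it actually used (OPEID, OPEID.6, or InstitutionName) is an assumption. The example uses the six OPEID.6 values printed for PersonID 68 at the top of this section.

```python
from collections import defaultdict


def n_institutions(enrollment_rows, key="OPEID.6"):
    """Count distinct institutions per PersonID (institution identified by `key`, an assumption)."""
    seen = defaultdict(set)
    for row in enrollment_rows:
        seen[row["PersonID"]].add(row[key])
    return {pid: len(ids) for pid, ids in seen.items()}


# The six OPEID.6 values printed for PersonID 68 at the top of this section.
rows = [{"PersonID": 68, "OPEID.6": v}
        for v in ["002547", "002371", "002371", "003469", "012586", "012586"]]
print(n_institutions(rows))  # {68: 4}
```

On those six rows alone the count is 4. The table reports 5 for PersonID 68, which reflects the full data rather than only the six rows printed (and possibly a different institution key).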
## # A tibble: 6 × 2
## PersonID n.institutions
## <int> <int>
## 1 68 5
## 2 161 1
## 3 676 1
## 4 758 1
## 5 786 2
## 6 1698 1
## [1] "PersonID" "n.institutions"
As expected, we have 35,056 rows and 2 columns. Each PersonID in the dataset has the number of unique post-secondary institutions they attended.
Next, we will create a dataset indicating the type of college each PersonID attended first, using the IPEDS sector data. Here are the definitions of the institution sector codes:
## # A tibble: 6 × 2
## PersonID InstitutionSector
## <int> <dbl>
## 1 68 2
## 2 161 2
## 3 676 4
## 4 758 4
## 5 786 4
## 6 1698 1
## [1] "PersonID" "InstitutionSector"
As expected, we have 35,056 rows and 2 columns. This dataset now provides the sector of the first post-secondary institution each PersonID attended.
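The first sector is the sector on the enrollment row with the earliest begin date. This Python sketch is not the original R code. How ties on the same begin date were broken is not shown, so keeping the first row seen is an assumption. In the example, the 20070820 row with sector 2 matches PersonID 68’s printed first enrollment and first sector; the sector on the other row is made up.

```python
def first_sector(enrollment_rows):
    """Sector of the enrollment with the earliest begin date for each PersonID.
    Ties keep whichever row was seen first (an assumption)."""
    earliest = {}
    for row in enrollment_rows:
        pid = row["PersonID"]
        begin = int(row["EnrollmentBeginTimeID"])
        if pid not in earliest or begin < earliest[pid][0]:
            earliest[pid] = (begin, row["InstitutionSector"])
    return {pid: sector for pid, (_, sector) in earliest.items()}


rows = [
    {"PersonID": 68, "EnrollmentBeginTimeID": 20080827, "InstitutionSector": 1},  # sector made up
    {"PersonID": 68, "EnrollmentBeginTimeID": 20070820, "InstitutionSector": 2},
]
print(first_sector(rows))  # {68: 2}
```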
Next, we will determine the type(s) of college(s) they attended during their post-secondary career. To do this, we will first create a dataset with a column for each institution sector and an indicator of whether the PersonID attended that sector at some point in their career. I will then create a new category for individuals who attended more than one institution sector. Once completed, I will be left with a dataset that has a row for each unique PersonID and the sector they attended, including a newly created code for “attended more than 1 type of sector”.
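A compact Python sketch of the same coding rule. It skips the intermediate wide table of per-sector indicators but should yield the same codes: one sector gives that sector’s code, and more than one gives 10. The rows are hypothetical, and this is not the original R implementation.

```python
from collections import defaultdict

MULTIPLE_SECTORS = 10  # code the text uses for "attended more than 1 type of sector"


def sector_code(enrollment_rows):
    """Single sector code if a PersonID only attended one sector, else MULTIPLE_SECTORS."""
    sectors = defaultdict(set)
    for row in enrollment_rows:
        sectors[row["PersonID"]].add(row["InstitutionSector"])
    return {
        pid: next(iter(s)) if len(s) == 1 else MULTIPLE_SECTORS
        for pid, s in sectors.items()
    }


# Hypothetical rows: person 1 stays in sector 4, person 2 moves from sector 4 to sector 1.
rows = [
    {"PersonID": 1, "InstitutionSector": 4},
    {"PersonID": 1, "InstitutionSector": 4},
    {"PersonID": 2, "InstitutionSector": 4},
    {"PersonID": 2, "InstitutionSector": 1},
]
print(sector_code(rows))  # {1: 4, 2: 10}
```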
## # A tibble: 6 × 2
## PersonID InstitutionSector
## <int> <dbl>
## 1 68 10
## 2 161 2
## 3 676 4
## 4 758 4
## 5 786 10
## 6 1698 1
## [1] "PersonID" "InstitutionSector"
As expected, we have 35,056 rows and 2 columns. This dataset provides each unique PersonID with the institution sector they attended, using the codes listed above. PersonIDs that attended multiple sectors were coded as “10”.
Next, we will determine whether they attended a post-secondary institution inside or outside of the planning region, outside their EDR, or outside of Minnesota. Since many of the PersonIDs in the dataset attended multiple institutions, we will categorize attendance in the following way to capture the combinations:
To do this, we will need to combine our planning region and EDR joining documents with the nsc.enrollment.ipeds dataset. We will also need to join the master dataset to determine the location of each PersonID’s high school. Lastly, we will need to join the RUCA categories for counties outside of Minnesota. Then we can start the categorization process.
## # A tibble: 6 × 11
## Perso…¹ Insti…² Count…³ ps.De…⁴ grad.…⁵ grad.…⁶ grad.pr ps.co…⁷ ps.st…⁸ ps.edr
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <fct>
## 1 161 Gustav… 27103 Urban/… EDR 8 … Town/r… Southw… 103 27 EDR 9…
## 2 161 Gustav… 27103 Urban/… EDR 8 … Town/r… Southw… 103 27 EDR 9…
## 3 161 Gustav… 27103 Urban/… EDR 8 … Town/r… Southw… 103 27 EDR 9…
## 4 161 Gustav… 27103 Urban/… EDR 8 … Town/r… Southw… 103 27 EDR 9…
## 5 161 Gustav… 27103 Urban/… EDR 8 … Town/r… Southw… 103 27 EDR 9…
## 6 161 Gustav… 27103 Urban/… EDR 8 … Town/r… Southw… 103 27 EDR 9…
## # … with 1 more variable: ps.pr <fct>, and abbreviated variable names
## # ¹PersonID, ²InstitutionName, ³CountyCode, ⁴ps.Dem_Desc, ⁵grad.edr,
## # ⁶grad.ruca, ⁷ps.countyfp, ⁸ps.statefp
## [1] "PersonID" "InstitutionName" "CountyCode" "ps.Dem_Desc"
## [5] "grad.edr" "grad.ruca" "grad.pr" "ps.countyfp"
## [9] "ps.statefp" "ps.edr" "ps.pr"
This joined dataset has 226,904 rows and 11 columns. The columns beginning with “ps” are the RUCA category and regions of the post-secondary institution attended. The columns beginning with “grad” are the RUCA category and regions of the high school from which the PersonID graduated.
One important thing to realize is that joining the nsc.enrollment dataset with the SW graduates dataset produces some NAs. Some of the students listed in the nsc.enrollment dataset aren’t in the SW graduates dataset, since they may have graduated before 2008 or didn’t meet the criteria for the SW graduates dataset. Therefore, as we move forward, I will need to make sure to remove those NAs.
From here we can start creating new columns, beginning with the RUCA category. To create this column, we will group by PersonID and examine whether they attended a post-secondary institution in the same RUCA category as their high school, in a different one, or, if they attended multiple post-secondary institutions, both (one institution in the same category and another in a different one).
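The three-way rule can be written once and reused. This is a Python sketch, not the original R code. The labels mirror those printed in the tables for RUCA, EDR, and planning region. The Minnesota column uses different labels (“In MN”, “Outside MN”, “Inside and outside MN”), so it would need its own strings. Category names come from the crosstabs, but the combinations in the usage lines are hypothetical.

```python
def classify_location(grad_value, ps_values, unit):
    """Compare the high-school value (RUCA category, EDR, or planning region) with
    the value for each post-secondary enrollment. Returns None when there is nothing
    to compare, e.g. graduates missing from the SW graduates data (NAs to drop)."""
    if grad_value is None or not ps_values:
        return None
    matches = [value == grad_value for value in ps_values]
    if all(matches):
        return f"In same {unit}"
    if not any(matches):
        return f"Outside {unit}"
    return f"Inside and outside same {unit}"


print(classify_location("Town/rural mix", ["Town/rural mix", "Town/rural mix"], "RUCA"))
print(classify_location("Entirely rural", ["Town/rural mix"], "RUCA"))
print(classify_location("EDR 8 - Southwest",
                        ["EDR 8 - Southwest", "EDR 6E- Southwest Central"], "EDR"))
```

The three calls print “In same RUCA”, “Outside RUCA”, and “Inside and outside same EDR”. Returning `None` for unmatched graduates is one way to carry the NAs forward so they can be dropped in one place.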
## # A tibble: 6 × 2
## PersonID ps.in.same.ruca
## <dbl> <chr>
## 1 161 Outside RUCA
## 2 676 In same RUCA
## 3 786 Outside RUCA
## 4 2292 Inside and outside same RUCA
## 5 2782 Outside RUCA
## 6 2809 Outside RUCA
## [1] "PersonID" "ps.in.same.ruca"
There are fewer observations in this dataset than in the previous subsets. Why? Because here we dropped any PersonID that wasn’t in the Southwest graduates dataset, which I did not do for the previous subsets. Those previous subsets will be filtered down once we join them with the master dataset.
This dataset provides each PersonID and whether the post-secondary institution(s) they attended were located in a county with the same RUCA category as their high school, a different one, or both (if they attended multiple post-secondary institutions). There are 29,371 rows and 2 columns.
Up next, we will create a dataset indicating whether a PersonID who graduated from a Southwest MN high school attended a post-secondary institution in their high school’s EDR.
## # A tibble: 6 × 2
## PersonID ps.in.same.edr
## <dbl> <chr>
## 1 161 Outside EDR
## 2 676 Outside EDR
## 3 786 Outside EDR
## 4 2292 Inside and outside same EDR
## 5 2782 Inside and outside same EDR
## 6 2809 Outside EDR
## [1] "PersonID" "ps.in.same.edr"
As expected, there are 29,371 rows and 2 columns.
Next we will determine which Southwest MN graduates attended a post-secondary institution in the same planning region (Southwest planning region).
## # A tibble: 6 × 2
## PersonID ps.in.same.pr
## <dbl> <chr>
## 1 161 In same PR
## 2 676 Outside PR
## 3 786 Outside PR
## 4 2292 Inside and outside same PR
## 5 2782 Inside and outside same PR
## 6 2809 Outside PR
## [1] "PersonID" "ps.in.same.pr"
As expected, there are 29,371 rows and 2 columns. This dataset provides whether the post-secondary institution(s) attended were in the same planning region as the high school from which they graduated, outside of the planning region, or both (attended multiple institutions).
Next we want to see how many of the students leave the state to attend post-secondary education.
As expected, we have 29,371 rows and 2 columns. This dataset provides each distinct PersonID and whether they attended post-secondary institutions inside MN, outside MN, or both.
Okay, now it’s time to join all of these with the master dataset. I will also create a new column indicating whether they attended college immediately (within the first year) after graduating from high school.
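How the years-since-graduation value was computed is not shown. This Python sketch assumes a plain calendar-year difference between the first enrollment year and grad.year, with “within the first year” meaning a difference of 0 or less. Both rules are assumptions. The two examples use PersonIDs 161 and 676, whose grad.year and first.attend.ps values appear in the printed output.

```python
def years_after_grad(grad_year, first_attend):
    """Calendar-year gap between graduation year and first enrollment (YYYYMMDD text)."""
    return int(first_attend[:4]) - grad_year


def within_first_year(grad_year, first_attend):
    """'Yes' if the first enrollment began in the graduation calendar year (assumed rule)."""
    if first_attend is None:
        return None  # never enrolled, so the question does not apply
    return "Yes" if years_after_grad(grad_year, first_attend) <= 0 else "No"


# PersonID 161: grad.year 2009, first.attend.ps "20090908"
print(years_after_grad(2009, "20090908"), within_first_year(2009, "20090908"))  # 0 Yes
# PersonID 676: grad.year 2008, first.attend.ps "20090113"
print(years_after_grad(2008, "20090113"), within_first_year(2008, "20090113"))  # 1 No
```

The second case shows the weakness of a pure calendar-year rule. A January 2009 start after a 2008 graduation is only months later but counts as 1 year. Comparing full dates would avoid that, at the cost of assuming a graduation date.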
## # A tibble: 6 × 47
## Perso…¹ K12Or…² grad.…³ Gender Limit…⁴ Homel…⁵ econo…⁶ pseo.…⁷ Speci…⁸ non.e…⁹
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 161 77452 2009 M N N 0 1 0 0
## 2 641 38385 2009 M N N 1 0 0 0
## 3 676 55598 2008 M N N 0 0 1 0
## 4 786 56633 2010 F N N 0 0 0 0
## 5 1162 181770 2019 M N N 1 0 0 1
## 6 2292 55659 2010 F N N 0 1 0 0
## # … with 37 more variables: DistrictName <chr>, county.name <chr>,
## # countyfp <chr>, Dem_Desc <chr>, edr <chr>, n.years.attended <dbl>,
## # ACTCompositeScore <dbl>, ap.exam <dbl>, total.cte.courses.taken <dbl>,
## # cte.careerfield.1 <dbl>, cte.careerfield.2 <dbl>, cte.careerfield.3 <dbl>,
## # cte.careerfield.4 <dbl>, cte.careerfield.5 <dbl>, cte.careerfield.6 <dbl>,
## # cte.careerfield.7 <dbl>, cte.careerfield.8 <dbl>, cte.careerfield.9 <dbl>,
## # cte.careerfield.NULL <dbl>, cte.achievement <chr>, …
## [1] "PersonID"
## [2] "K12OrganizationID"
## [3] "grad.year"
## [4] "Gender"
## [5] "LimitedEnglishProficiencyIndicator"
## [6] "HomelessIndicator"
## [7] "economic.status"
## [8] "pseo.participant"
## [9] "SpecialEdStatus"
## [10] "non.english.home"
## [11] "DistrictName"
## [12] "county.name"
## [13] "countyfp"
## [14] "Dem_Desc"
## [15] "edr"
## [16] "n.years.attended"
## [17] "ACTCompositeScore"
## [18] "ap.exam"
## [19] "total.cte.courses.taken"
## [20] "cte.careerfield.1"
## [21] "cte.careerfield.2"
## [22] "cte.careerfield.3"
## [23] "cte.careerfield.4"
## [24] "cte.careerfield.5"
## [25] "cte.careerfield.6"
## [26] "cte.careerfield.7"
## [27] "cte.careerfield.8"
## [28] "cte.careerfield.9"
## [29] "cte.careerfield.NULL"
## [30] "cte.achievement"
## [31] "avg.cte.intensity"
## [32] "english.learner"
## [33] "id"
## [34] "MCA.M"
## [35] "MCA.R"
## [36] "MCA.S"
## [37] "sat.taken"
## [38] "attended.ps"
## [39] "attended.ps.years.hsgrad"
## [40] "attended.ps.within.first.year.hsgrad"
## [41] "n.institutions"
## [42] "first.InstitutionSector"
## [43] "InstitutionSector"
## [44] "ps.in.same.ruca"
## [45] "ps.in.same.edr"
## [46] "ps.in.same.pr"
## [47] "ps.in.MN"
As expected, we have the same number of rows as the original master dataset - 38,154 rows. Here are the explanations of all the new columns added to the master dataset.
Let’s summarize the percentage of students who attended a post-secondary institution and compare across RUCA categories and regions.
Below is the percentage of students who attended a post-secondary institution from the entire dataset. 77% of the PersonIDs in the dataset (29,371 of 38,154) attended a post-secondary institution at some point between 2007 and 2019.
Now let’s check whether this percentage differs significantly by RUCA group.
The crosstabs clearly indicate that the percentage of students that attend a post-secondary institution differs significantly by RUCA category - p-value = 1.802681e-23.
In particular, the percentage of students that attend a post-secondary institution from entirely rural graduates (82%) is significantly higher than the town/rural mix and urban/town/rural mix categories (76% and 78%, respectively).
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table: 38154
##
##
## | master.9$attended.ps
## master.9$Dem_Desc | No | Yes | Row Total |
## ---------------------|-----------|-----------|-----------|
## Entirely rural | 996 | 4535 | 5531 |
## | 1273.229 | 4257.771 | |
## | 0.180 | 0.820 | 0.145 |
## ---------------------|-----------|-----------|-----------|
## Town/rural mix | 6460 | 20158 | 26618 |
## | 6127.428 | 20490.572 | |
## | 0.243 | 0.757 | 0.698 |
## ---------------------|-----------|-----------|-----------|
## Urban/town/rural mix | 1327 | 4678 | 6005 |
## | 1382.343 | 4622.657 | |
## | 0.221 | 0.779 | 0.157 |
## ---------------------|-----------|-----------|-----------|
## Column Total | 8783 | 29371 | 38154 |
## ---------------------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 104.7404 d.f. = 2 p = 1.802681e-23
##
##
##
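The chi-square statistic and p-value in these crosstabs can be reproduced from the observed counts alone. This Python sketch (standard library only, not the original R code) computes expected counts as row total × column total / N and sums (O − E)²/E. For the p-value it uses the closed-form chi-square survival function, which is exact for even degrees of freedom. That covers every table in this section (d.f. = 2, 4, or 6).

```python
import math


def chi_square_test(observed):
    """Pearson chi-square test of independence for an r x c table of counts.

    The p-value uses the closed-form chi-square survival function
    P(X > x) = exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, which holds only for even df.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df % 2:
        raise ValueError("this closed-form p-value only covers even degrees of freedom")
    half = chi2 / 2
    p = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return chi2, df, p


# Counts from the RUCA crosstab above: rows are Entirely rural, Town/rural mix,
# Urban/town/rural mix; columns are attended.ps = No, Yes.
ruca_counts = [[996, 4535], [6460, 20158], [1327, 4678]]
chi2, df, p = chi_square_test(ruca_counts)
print(round(chi2, 4), df, f"{p:.6e}")
```

This reproduces the printed Chi^2 of about 104.74 on 2 d.f. with p near 1.8e-23. For the very large statistics later in this section (thousands), `math.exp` underflows to 0.0. That is the same reason those tables print p = 0.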
Now let’s check whether this percentage differs significantly by EDR.
The crosstabs clearly indicate that the percentage of students who attend a post-secondary institution differs significantly by EDR (p-value = 1.704582e-10).
In particular, the percentage of students that attend a post-secondary institution from EDR 8 - Southwest (79%) is significantly higher than EDR 6E and 6W (76% and 76%, respectively).
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table: 38154
##
##
## | master.9$attended.ps
## master.9$edr | No | Yes | Row Total |
## -------------------------------|-----------|-----------|-----------|
## EDR 6E- Southwest Central | 3420 | 10521 | 13941 |
## | 3209.200 | 10731.800 | |
## | 0.245 | 0.755 | 0.365 |
## -------------------------------|-----------|-----------|-----------|
## EDR 6W- Upper Minnesota Valley | 1754 | 5610 | 7364 |
## | 1695.183 | 5668.817 | |
## | 0.238 | 0.762 | 0.193 |
## -------------------------------|-----------|-----------|-----------|
## EDR 8 - Southwest | 3609 | 13240 | 16849 |
## | 3878.617 | 12970.383 | |
## | 0.214 | 0.786 | 0.442 |
## -------------------------------|-----------|-----------|-----------|
## Column Total | 8783 | 29371 | 38154 |
## -------------------------------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 44.98506 d.f. = 2 p = 1.704582e-10
##
##
##
Now let’s take a look at the number of years between high school graduation and attending post-secondary. We will begin with the total number of students before diving into differences across RUCA categories and regions.
A large majority (87%) of students who attended a post-secondary institution waited less than 1 year after graduating from high school.
Next, let’s check whether the number of years between high school and attending post-secondary differs by RUCA category.
The ANOVA table indicates there are significant differences between the means across RUCA categories, with a p-value = .00051.
In particular, the average number of years that students wait between high school and post-secondary increases as a high school becomes more urban. Students who graduated from an entirely rural high school waited, on average, .24 years before attending post-secondary, vs. .29 and .31 for students from town/rural mix and urban/town/rural mix schools.
The Tukey comparisons show that entirely rural graduates differ significantly from both other groups (p adj ≈ 0.001 in each case), while the difference between town/rural mix and urban/town/rural mix schools is not statistically significant (p adj = 0.60).
## Df Sum Sq Mean Sq F value Pr(>F)
## Dem_Desc 2 14 7.079 7.583 0.00051 ***
## Residuals 29368 27416 0.934
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = attended.ps.years.hsgrad ~ Dem_Desc, data = years.between.grad.ps.anova.ruca)
##
## $Dem_Desc
## diff lwr upr p adj
## Town/rural mix-Entirely rural 0.05600619 0.01878897 0.09322342 0.0012228
## Urban/town/rural mix-Entirely rural 0.07116574 0.02397561 0.11835587 0.0011888
## Urban/town/rural mix-Town/rural mix 0.01515955 -0.02159037 0.05190946 0.5979123
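To make the ANOVA table concrete, F is the between-group mean square divided by the within-group mean square. This Python sketch computes it from raw group lists. The input is a toy example, not the study data, and it only reproduces the F statistic. The F-distribution p-value and the Tukey adjustments need distributions the Python standard library does not provide, which is what R’s `aov` and `TukeyHSD` supply in the output above.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy data: three small groups of 0/1-year waits (not the study data).
f, df_between, df_within = one_way_anova_f([[0, 0, 1], [0, 1, 1], [1, 1, 1]])
print(round(f, 3), df_between, df_within)  # 1.5 2 6
```

Applied to the printed RUCA table, the same ratio is 7.079 / 0.934 ≈ 7.58. The residual d.f. of 29,368 is the 29,371 students minus the 3 groups.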
Next, let’s check whether the number of years between high school and attending post-secondary differs by EDR.
The ANOVA table indicates there are significant differences between the means across EDRs, with a p-value = 3.36e-12.
In particular, the Tukey comparisons show that each of the means is statistically different from the others. EDR 8 had the lowest average with .25 years, followed by EDR 6W with .29 and EDR 6E with .34.
## Df Sum Sq Mean Sq F value Pr(>F)
## edr 2 49 24.654 26.44 3.36e-12 ***
## Residuals 29368 27381 0.932
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = attended.ps.years.hsgrad ~ edr, data = years.between.grad.ps.anova.edr)
##
## $edr
## diff lwr upr p adj
## EDR 6W-EDR 6E -0.05252393 -0.08993598 -0.015111881 0.0028754
## EDR 8 -EDR 6E -0.09170179 -0.12125813 -0.062145446 0.0000000
## EDR 8 -EDR 6W -0.03917785 -0.07522916 -0.003126544 0.0292602
Next we will summarize the number of colleges attended by producing the summary statistics and distribution.
The table and distribution chart below show that a large majority (60%) of individuals attended only one post-secondary institution, followed by 28% who attended two.
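A distribution like this is just the share of people at each count. A Python sketch with hypothetical n.institutions values (not the study data), returning percentages rounded to one decimal plus the mean:

```python
from collections import Counter
from statistics import mean


def summarize_counts(values):
    """Percentage of people at each value (rounded to 0.1) plus the mean."""
    counts = Counter(values)
    total = len(values)
    shares = {k: round(100 * counts[k] / total, 1) for k in sorted(counts)}
    return shares, mean(values)


# Hypothetical n.institutions values for five people
shares, avg = summarize_counts([1, 1, 1, 2, 3])
print(shares, avg)  # {1: 60.0, 2: 20.0, 3: 20.0} 1.6
```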
I’m not sure it’s all that important to know whether the number of institutions attended differs across RUCA categories or regions, so we will skip that analysis for now.
I think this will be very interesting: we want to see the breakdown of students attending different types of colleges. We will start by summarizing the total dataset before looking at differences across RUCA categories and regions.
The chart below provides the percentage of students in the entire dataset who attended each institution sector. The vast majority attend either a public 4-year or a public 2-year institution.
Next, we will check whether those percentages differ significantly by the RUCA category of the graduate’s high school.
The crosstabs indicate that there is a significant difference in the percentage of students attending different institution sectors depending on the RUCA category of their high school. The p-value was 3.35711e-23.
There seem to be significant differences in attendance across all four institution sectors.
Since only a few sectors had a substantial number of students attending, we will reduce the sectors analyzed for differences to 4:
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table: 29296
##
##
## | first.college.sector.ct.1.ruca$first.InstitutionSector
## first.college.sector.ct.1.ruca$Dem_Desc | 1 | 2 | 3 | 4 | Row Total |
## ----------------------------------------|-----------|-----------|-----------|-----------|-----------|
## Entirely rural | 2134 | 475 | 34 | 1878 | 4521 |
## | 1988.586 | 610.495 | 49.229 | 1872.690 | |
## | 0.472 | 0.105 | 0.008 | 0.415 | 0.154 |
## ----------------------------------------|-----------|-----------|-----------|-----------|-----------|
## Town/rural mix | 8559 | 2766 | 218 | 8565 | 20108 |
## | 8844.610 | 2715.294 | 218.953 | 8329.143 | |
## | 0.426 | 0.138 | 0.011 | 0.426 | 0.686 |
## ----------------------------------------|-----------|-----------|-----------|-----------|-----------|
## Urban/town/rural mix | 2193 | 715 | 67 | 1692 | 4667 |
## | 2052.805 | 630.211 | 50.818 | 1933.166 | |
## | 0.470 | 0.153 | 0.014 | 0.363 | 0.159 |
## ----------------------------------------|-----------|-----------|-----------|-----------|-----------|
## Column Total | 12886 | 3956 | 319 | 12135 | 29296 |
## ----------------------------------------|-----------|-----------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 118.5052 d.f. = 6 p = 3.35711e-23
##
##
##
Now let’s check whether there are differences by EDR.
The crosstabs indicated a relationship between the institution sector of a student’s first college and the EDR of their high school. The p-value was 7.986177e-52.
All four institution sectors measured seemed to have significant differences.
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table: 29296
##
##
## | first.college.sector.ct.1.edr$first.InstitutionSector
## first.college.sector.ct.1.edr$edr | 1 | 2 | 3 | 4 | Row Total |
## ----------------------------------|-----------|-----------|-----------|-----------|-----------|
## EDR 6E- Southwest Central | 4253 | 1727 | 158 | 4354 | 10492 |
## | 4614.961 | 1416.792 | 114.246 | 4346.000 | |
## | 0.405 | 0.165 | 0.015 | 0.415 | 0.358 |
## ----------------------------------|-----------|-----------|-----------|-----------|-----------|
## EDR 6W- Upper Minnesota Valley | 2321 | 652 | 53 | 2573 | 5599 |
## | 2462.750 | 756.064 | 60.967 | 2319.220 | |
## | 0.415 | 0.116 | 0.009 | 0.460 | 0.191 |
## ----------------------------------|-----------|-----------|-----------|-----------|-----------|
## EDR 8 - Southwest | 6312 | 1577 | 108 | 5208 | 13205 |
## | 5808.289 | 1783.144 | 143.787 | 5469.780 | |
## | 0.478 | 0.119 | 0.008 | 0.394 | 0.451 |
## ----------------------------------|-----------|-----------|-----------|-----------|-----------|
## Column Total | 12886 | 3956 | 319 | 12135 | 29296 |
## ----------------------------------|-----------|-----------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 253.3248 d.f. = 6 p = 7.986177e-52
##
##
##
Up next is determining how many students attended a post-secondary institution located in a county with the same RUCA category as their high school. First we will look at the percentages of the total dataset and then we will break it up.
The chart below shows that nearly half (48.2%) of students graduating from a SW MN high school attended a post secondary institution that was not located in a county with the same RUCA category as their high school.
Next, let’s check whether the RUCA category of the high school they graduated from is related to whether the college they attend is in the same or a different RUCA category.
The crosstabs indicate that there is a relationship between the RUCA category of the individual’s high school and the RUCA category of their post-secondary institution(s). The p-value was reported as 0, meaning it is too small to represent numerically.
As expected, individuals who graduated from a high school in an entirely rural county were significantly more likely to attend a post-secondary institution that wasn’t in an entirely rural county: 4,533 of 4,535 attended only institutions outside their RUCA category.
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table: 29371
##
##
## | ps.same.ruca.ct.ruca$ps.in.same.ruca
## ps.same.ruca.ct.ruca$Dem_Desc | In same RUCA | Inside and outside same RUCA | Outside RUCA | Row Total |
## ------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
## Entirely rural | 0 | 2 | 4533 | 4535 |
## | 1529.063 | 819.576 | 2186.361 | |
## | 0.000 | 0.000 | 1.000 | 0.154 |
## ------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
## Town/rural mix | 8885 | 4285 | 6988 | 20158 |
## | 6796.659 | 3643.004 | 9718.337 | |
## | 0.441 | 0.213 | 0.347 | 0.686 |
## ------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
## Urban/town/rural mix | 1018 | 1021 | 2639 | 4678 |
## | 1577.278 | 845.420 | 2255.302 | |
## | 0.218 | 0.218 | 0.564 | 0.159 |
## ------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
## Column Total | 9903 | 5308 | 14160 | 29371 |
## ------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 6685.248 d.f. = 4 p = 0
##
##
##
Now we want to see how many students attend a college that is located in the same EDR as their high school. We will start with the total dataset and then perform crosstabs to determine if the EDR of their high school is related to whether they attend a college in the same EDR.
The chart below shows that nearly 80% of students graduating from a SW MN High School attend a college outside of their high school’s EDR.
Next, let’s check whether the EDR they graduated from is related to whether the college they attend is in the same or a different EDR.
The crosstabs indicate that there is a relationship between the EDR of the individual’s high school and the EDR of their post-secondary institution(s). The p-value was reported as 0, meaning it is too small to represent numerically.
Individuals from EDR 6E were significantly more likely to attend a post-secondary institution in the same EDR, and more likely to attend multiple colleges that were inside and outside the same EDR, compared to graduates from the other EDRs.
In addition, graduates from EDR 8 and 6W were significantly more likely to attend a post secondary institution outside of their EDR compared to EDR 6E.
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table: 29371
##
##
## | ps.same.edr.ct.edr$ps.in.same.edr
## ps.same.edr.ct.edr$edr | In same EDR | Inside and outside same EDR | Outside EDR | Row Total |
## -------------------------------|-----------------------------|-----------------------------|-----------------------------|-----------------------------|
## EDR 6E- Southwest Central | 2041 | 1918 | 6562 | 10521 |
## | 1095.766 | 1245.498 | 8179.736 | |
## | 0.194 | 0.182 | 0.624 | 0.358 |
## -------------------------------|-----------------------------|-----------------------------|-----------------------------|-----------------------------|
## EDR 6W- Upper Minnesota Valley | 392 | 472 | 4746 | 5610 |
## | 584.283 | 664.123 | 4361.593 | |
## | 0.070 | 0.084 | 0.846 | 0.191 |
## -------------------------------|-----------------------------|-----------------------------|-----------------------------|-----------------------------|
## EDR 8 - Southwest | 626 | 1087 | 11527 | 13240 |
## | 1378.951 | 1567.379 | 10293.671 | |
## | 0.047 | 0.082 | 0.871 | 0.451 |
## -------------------------------|-----------------------------|-----------------------------|-----------------------------|-----------------------------|
## Column Total | 3059 | 3477 | 22835 | 29371 |
## -------------------------------|-----------------------------|-----------------------------|-----------------------------|-----------------------------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 2357.315 d.f. = 4 p = 0
##
##
##
Now we want to see how many students attend a college that is located in the same planning region as their high school.
The chart below shows that nearly 70% of students graduating from SW MN High schools leave the region to attend post secondary education.
Now we want to see how many students stay or leave Minnesota to attend college.
The chart below shows that 59% of students graduating from SW MN high schools attend a post secondary institution in Minnesota.
Next, let’s check whether the RUCA category of their high school is related to whether they attend a college inside or outside Minnesota.
The crosstabs below indicate that there is a relationship between the RUCA category of a student’s high school and whether they attend a college inside or outside of Minnesota. The p-value was 1.10919e-31.
The primary difference is that students who graduated from an entirely rural high school were significantly less likely to attend a college only in Minnesota (52% vs. 60% and 62%) and more likely to attend one only outside of Minnesota (31% vs. 23% for both other categories) compared to students from other RUCA categories.
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table: 29371
##
##
## | ps.in.MN.ct.ruca$ps.in.MN
## ps.in.MN.ct.ruca$Dem_Desc | In MN | Inside and outside MN | Outside MN | Row Total |
## --------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
## Entirely rural | 2339 | 804 | 1392 | 4535 |
## | 2677.674 | 753.492 | 1103.834 | |
## | 0.516 | 0.177 | 0.307 | 0.154 |
## --------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
## Town/rural mix | 12103 | 3363 | 4692 | 20158 |
## | 11902.218 | 3349.257 | 4906.525 | |
## | 0.600 | 0.167 | 0.233 | 0.686 |
## --------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
## Urban/town/rural mix | 2900 | 713 | 1065 | 4678 |
## | 2762.108 | 777.251 | 1138.641 | |
## | 0.620 | 0.152 | 0.228 | 0.159 |
## --------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
## Column Total | 17342 | 4880 | 7149 | 29371 |
## --------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 151.2306 d.f. = 4 p = 1.10919e-31
##
##
##
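The "Expected N" rows and the test statistic above follow from the standard Pearson chi-squared computation. For each cell, E = (row total × column total) / N, the statistic is the sum of (O - E)² / E, and d.f. = (rows - 1) × (columns - 1). The Python sketch below recomputes these from the RUCA counts using only the standard library. For an even number of degrees of freedom, the upper-tail p-value has a closed form, e^(-x/2) × Σ_{i=0}^{k/2-1} (x/2)^i / i!, so no statistics package is needed here. This is an illustration of the test, not the code that produced the table.

```python
import math


def pearson_chi2(observed):
    """Return (chi2, dof, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected


def chi2_sf_even_dof(x, dof):
    """Upper-tail chi-squared p-value; this closed form is valid only for even dof."""
    if dof % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Observed counts copied from the RUCA crosstab above.
ruca = [
    [2339, 804, 1392],    # Entirely rural
    [12103, 3363, 4692],  # Town/rural mix
    [2900, 713, 1065],    # Urban/town/rural mix
]
chi2, dof, expected = pearson_chi2(ruca)
p = chi2_sf_even_dof(chi2, dof)
print(f"Chi^2 = {chi2:.4f}, d.f. = {dof}, p = {p:.5e}")
```

Running it should reproduce the statistic, degrees of freedom, and p-value reported in the output above.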
Next, let’s check whether the EDR of a student’s high school is related to whether they attend a college inside or outside Minnesota.
The crosstabs below indicate that there is a relationship between the EDR of a student’s high school and whether they attend a college inside or outside of Minnesota. The p-value was 1.61384e-264.
The primary difference is that students who graduate from EDR 6E are much more likely to attend a college only in Minnesota (71.5%, compared with 57.0% in EDR 6W and 50.0% in EDR 8) and much less likely to attend a college only outside of MN (14.6%, compared with 26.2% and 31.3%).
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table: 29371
##
##
## | ps.in.MN.ct.edr$ps.in.MN
## ps.in.MN.ct.edr$edr | In MN | Inside and outside MN | Outside MN | Row Total |
## -------------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
## EDR 6E- Southwest Central | 7524 | 1460 | 1537 | 10521 |
## | 6212.086 | 1748.067 | 2560.847 | |
## | 0.715 | 0.139 | 0.146 | 0.358 |
## -------------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
## EDR 6W- Upper Minnesota Valley | 3199 | 943 | 1468 | 5610 |
## | 3312.404 | 932.103 | 1365.493 | |
## | 0.570 | 0.168 | 0.262 | 0.191 |
## -------------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
## EDR 8 - Southwest | 6619 | 2477 | 4144 | 13240 |
## | 7817.510 | 2199.830 | 3222.660 | |
## | 0.500 | 0.187 | 0.313 | 0.451 |
## -------------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
## Column Total | 17342 | 4880 | 7149 | 29371 |
## -------------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 1227.65 d.f. = 4 p = 1.61384e-264
##
##
##
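A significant chi-squared test says only that the table departs from independence. It does not say which cells drive the departure. One common way to locate them is to look at Pearson residuals, (O - E) / sqrt(E), per cell: large positive values mark cells with more students than expected, and large negative values mark cells with fewer. The Python sketch below computes them for the EDR table. This is an added diagnostic, not a step the original analysis reports.

```python
import math


def pearson_residuals(observed):
    """Return the r x c table of Pearson residuals (O - E) / sqrt(E)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [
        [(o - r * c / n) / math.sqrt(r * c / n) for o, c in zip(row, col_totals)]
        for row, r in zip(observed, row_totals)
    ]


columns = ["In MN", "Inside and outside MN", "Outside MN"]
# Observed counts copied from the EDR crosstab above.
edr = {
    "EDR 6E- Southwest Central": [7524, 1460, 1537],
    "EDR 6W- Upper Minnesota Valley": [3199, 943, 1468],
    "EDR 8 - Southwest": [6619, 2477, 4144],
}
residuals = pearson_residuals(list(edr.values()))
for name, row in zip(edr, residuals):
    print(name, " ".join(f"{col}={res:+.1f}" for col, res in zip(columns, row)))

# The squared residuals sum to the Pearson chi-squared statistic.
print(round(sum(r * r for row in residuals for r in row), 2))
```

The largest residuals belong to EDR 6E (positive for "In MN", negative for "Outside MN") and EDR 8 (the reverse). This matches the pattern described in the text.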
The last piece here is to provide a map of the United States to see where students are going for college.
Outside of Minnesota, SW MN graduates attend colleges in South Dakota at the highest rate (18.1%), followed by North Dakota (6.2%).
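The text does not state the denominator for these per-state rates. The Python sketch below shows one plausible calculation. It assumes the rate is the share of all graduates with at least one enrollment in a given state, so each student is counted once per state even with several enrollment records there. The `(person_id, state)` layout and the toy data are hypothetical. Under this definition, the shares can sum to more than 100%, because a student may attend colleges in several states.

```python
from collections import defaultdict


def share_by_state(enrollments, n_students):
    """Fraction of students with at least one enrollment in each state.

    enrollments: iterable of (person_id, state) pairs (hypothetical layout).
    n_students:  denominator, assumed here to be all graduates in the cohort.
    """
    students_by_state = defaultdict(set)
    for person_id, state in enrollments:
        students_by_state[state].add(person_id)  # de-duplicate repeat semesters
    return {state: len(ids) / n_students for state, ids in students_by_state.items()}


def top_states_outside(shares, home="MN"):
    """Rank non-home states by share, highest first."""
    return sorted(
        ((s, v) for s, v in shares.items() if s != home),
        key=lambda item: item[1],
        reverse=True,
    )


# Toy data: 4 students, several semesters each (illustrative only).
toy = [(1, "MN"), (1, "MN"), (2, "SD"), (2, "MN"), (3, "SD"), (4, "ND")]
shares = share_by_state(toy, n_students=4)
print(top_states_outside(shares))
```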